When Do Flat Minima Optimizers Work?

Neural Information Processing Systems

Theoretical and empirical studies [21, 77, 9, 55, 49, 5, 12] postulate that such flatter regions generalize better than sharper minima, e.g., due to the flat minimizer's robustness against loss function shifts between train and test data, as illustrated in Fig. 1.



Formulating Robustness Against Unforeseen Attacks

Neural Information Processing Systems

Our bound addresses the second question; it suggests that learning algorithms that bias towards models with small variation across the source threat model exhibit a smaller drop in robustness to particular unforeseen attacks.